gradient function
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks
Patel, Vivak, Varner, Christian
The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature, along with corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.
- North America > United States > Wisconsin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Hampshire > Hillsborough County > Nashua (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
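To make the notion of a smoothness condition concrete, here is a small numerical probe related to the entry above, assuming the commonly studied $(L_0, L_1)$-smoothness inequality $\|\nabla f(x)-\nabla f(y)\| \le (L_0 + L_1\|\nabla f(x)\|)\,\|x-y\|$ as a representative condition and a two-layer linear network with logistic loss as a stand-in; the depth, constants, and loss are illustrative assumptions, not the paper's setup.

```python
# Numerically probe an (L0, L1)-smoothness inequality for the logistic loss
# of a small deep linear network (all sizes and constants are illustrative).
import torch

torch.manual_seed(0)
n, d, h = 32, 5, 4
X = torch.randn(n, d)
y = (torch.rand(n) < 0.5).float() * 2 - 1             # labels in {-1, +1}

def loss(W1, W2):
    logits = (X @ W1.T @ W2.T).squeeze(-1)             # deep linear predictor
    return torch.nn.functional.softplus(-y * logits).mean()

def grad_vec(W1, W2):
    g1, g2 = torch.autograd.grad(loss(W1, W2), (W1, W2))
    return torch.cat([g1.reshape(-1), g2.reshape(-1)])

def rand_params(scale):
    return (torch.randn(h, d).mul_(scale).requires_grad_(),
            torch.randn(1, h).mul_(scale).requires_grad_())

# Count violations of a candidate (L0, L1) pair over random parameter pairs:
# ||g(p) - g(q)|| <= (L0 + L1 ||g(p)||) ||p - q||.
L0, L1, violations = 1.0, 1.0, 0
for _ in range(200):
    p, q = rand_params(1.0), rand_params(2.0)
    gp = grad_vec(*p)
    dist = torch.cat([(a - b).reshape(-1) for a, b in zip(p, q)]).norm()
    lhs = (gp - grad_vec(*q)).norm()
    rhs = (L0 + L1 * gp.norm()) * dist
    violations += int(lhs > rhs)
print(f"violations of the candidate (L0, L1) bound: {violations}/200")
```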
A Differentiable Approach to Multi-scale Brain Modeling
Wang, Chaoming, Lyu, Muyang, Zhang, Tianqiu, He, Sichao, Wu, Si
We present a multi-scale differentiable brain modeling workflow utilizing BrainPy, a unique differentiable brain simulator that combines accurate brain simulation with powerful gradient-based optimization. We leverage this capability of BrainPy across different brain scales. At the single-neuron level, we implement differentiable neuron models and employ gradient methods to optimize their fit to electrophysiological data. On the network level, we incorporate connectomic data to construct biologically constrained network models. Finally, to replicate animal behavior, we train these models on cognitive tasks using gradient-based learning rules. Experiments demonstrate that our approach achieves superior performance and speed in fitting generalized leaky integrate-and-fire and Hodgkin-Huxley single neuron models. Additionally, training a biologically-informed network of excitatory and inhibitory spiking neurons on working memory tasks successfully replicates observed neural activity and synaptic weight distributions. Overall, our differentiable multi-scale simulation approach offers a promising tool to bridge neuroscience data across electrophysiological, anatomical, and behavioral scales.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
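As a rough illustration of the single-neuron fitting step described in the entry above, here is a generic differentiable fit of a subthreshold leaky integrate-and-fire model written in plain PyTorch; it does not use BrainPy's actual API, and the dynamics, constants, and optimizer settings are assumptions for illustration only.

```python
# Fit LIF parameters (tau, R) to a target voltage trace by backpropagating
# through the simulated membrane dynamics (subthreshold, Euler integration).
import torch

dt, T = 0.1, 200                           # ms per step, number of steps
I = 1.5 * torch.ones(T)                    # constant input current (illustrative)

def simulate(tau, R, v_rest=-65.0):
    v, trace = torch.tensor(v_rest), []
    for t in range(T):
        v = v + dt / tau * (-(v - v_rest) + R * I[t])
        trace.append(v)
    return torch.stack(trace)

with torch.no_grad():                      # synthetic "recording" from known parameters
    target = simulate(torch.tensor(12.0), torch.tensor(8.0))

tau = torch.tensor(20.0, requires_grad=True)
R = torch.tensor(5.0, requires_grad=True)
opt = torch.optim.Adam([tau, R], lr=0.1)
for step in range(300):
    opt.zero_grad()
    ((simulate(tau, R) - target) ** 2).mean().backward()
    opt.step()
print(tau.item(), R.item())                # should move toward 12.0 and 8.0
```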
Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks
Pan, Yuhao, Wang, Xiucheng, Cheng, Nan, Qiu, Qi
With the rapid development of artificial intelligence technology, the field of reinforcement learning has continuously achieved breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often entail high energy consumption during interactions with the environment. Spiking Neural Networks (SNNs), with their low energy consumption and performance comparable to deep neural networks, have garnered widespread attention. To reduce the energy consumption of practical reinforcement learning applications, researchers have successively proposed the Pop-SAN and MDC-SAN algorithms. Nonetheless, these algorithms use rectangular functions to approximate the spiking activation during training, resulting in low sensitivity and leaving room for improvement in the training effectiveness of SNNs. To address this, we propose a trapezoidal approximation gradient method to replace the rectangular one, which not only preserves the original stable learning state but also enhances the model's adaptability and response sensitivity under various signal dynamics. Simulation results show that the improved algorithm, which uses the trapezoidal approximation gradient, achieves better convergence speed and performance than the original algorithm and demonstrates good training stability.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.42)
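A minimal sketch of the surrogate-gradient idea in the entry above: keep the hard spike in the forward pass and backpropagate through a trapezoid instead of a rectangle. The threshold, width, and plateau values are illustrative assumptions, not the exact trapezoid used by this paper or by Pop-SAN/MDC-SAN.

```python
import torch

class TrapezoidSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; trapezoidal surrogate gradient
    (flat top around the threshold, linear ramps to zero) in the backward pass."""

    @staticmethod
    def forward(ctx, v, threshold, width, plateau):
        ctx.save_for_backward(v)
        ctx.threshold, ctx.width, ctx.plateau = threshold, width, plateau
        return (v >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        dist = (v - ctx.threshold).abs()
        # 1 on |v - threshold| <= plateau, decaying linearly to 0 at width,
        # 0 beyond; a rectangular surrogate would keep only the flat top.
        ramp = (ctx.width - dist) / (ctx.width - ctx.plateau)
        surrogate = torch.clamp(ramp, 0.0, 1.0) / ctx.width
        return grad_output * surrogate, None, None, None

v = torch.linspace(-1.0, 3.0, steps=9, requires_grad=True)
spikes = TrapezoidSpike.apply(v, 1.0, 1.0, 0.4)   # threshold, width, plateau
spikes.sum().backward()
print(v.grad)   # nonzero only in a trapezoidal band around the threshold
```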
Gradient Networks
Chaudhari, Shreyas, Pranav, Srinivasa, Moura, José M. F.
Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in optimization, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit specialized architectural constraints that ensure correspondence to gradient functions. We provide a comprehensive GradNet design framework that includes methods for transforming GradNets into monotone gradient networks (mGradNets), which are guaranteed to represent gradients of convex functions. We establish the approximation capabilities of the proposed GradNet and mGradNet. Our results demonstrate that these networks universally approximate the gradients of (convex) functions. Furthermore, these networks can be customized to correspond to specific spaces of (monotone) gradient functions, including gradients of transformed sums of (convex) ridge functions. Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results show that these architectures offer efficient parameterizations and outperform popular methods in gradient field learning tasks.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
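As a sketch of what such an architectural guarantee can look like, the single-layer construction below outputs the gradient of an explicitly convex potential; this is a standard illustrative construction under assumed sizes, not necessarily the paper's GradNet-C/GradNet-M architecture.

```python
import torch

d, m = 3, 16
W = torch.randn(m, d)
b = torch.randn(m)

def grad_net(x):
    # g(x) = W^T sigmoid(W x + b): its Jacobian W^T diag(sigmoid') W is
    # symmetric PSD, so g is the gradient of the convex potential below.
    return W.T @ torch.sigmoid(W @ x + b)

def potential(x):
    # F(x) = sum_i softplus(w_i^T x + b_i), convex in x, with grad F = g.
    return torch.nn.functional.softplus(W @ x + b).sum()

x = torch.randn(d, requires_grad=True)
autograd_g = torch.autograd.grad(potential(x), x)[0]
print(torch.allclose(grad_net(x.detach()), autograd_g, atol=1e-5))  # True
```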
Storchastic: A Framework for General Stochastic Automatic Differentiation
van Krieken, Emile, Tomczak, Jakub M., Teije, Annette ten
Modelers use automatic differentiation of computation graphs to implement complex Deep Learning models without defining gradient computations. However, modelers often use sampling methods to estimate intractable expectations such as in Reinforcement Learning and Variational Inference. Current methods for estimating gradients through these sampling steps are limited: They are either only applicable to continuous random variables and differentiable functions, or can only use simple but high variance score-function estimators. To overcome these limitations, we introduce Storchastic, a new framework for automatic differentiation of stochastic computation graphs. Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates. Furthermore, Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates. Finally, we implement Storchastic as a PyTorch library.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (17 more...)
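For context, here is a bare-bones score-function (REINFORCE) estimator of the kind Storchastic generalizes, written in plain PyTorch rather than the Storchastic library's API; the objective, sample count, and baseline are illustrative assumptions.

```python
import torch

# Objective: maximize E_{k ~ Categorical(softmax(theta))}[ f(k) ]. The sample
# k is discrete, so gradients cannot flow through it directly; the score
# function estimator uses grad = E[ f(k) * grad log p(k) ] instead.
torch.manual_seed(0)
theta = torch.zeros(4, requires_grad=True)
f = torch.tensor([0.1, 0.5, 0.2, 0.9])          # per-outcome reward (illustrative)
opt = torch.optim.SGD([theta], lr=0.5)

for step in range(500):
    opt.zero_grad()
    dist = torch.distributions.Categorical(logits=theta)
    k = dist.sample((64,))                      # 64 Monte Carlo samples
    reward = f[k]
    baseline = reward.mean().detach()           # simple variance reduction
    loss = -((reward - baseline) * dist.log_prob(k)).mean()
    loss.backward()
    opt.step()

print(torch.softmax(theta, dim=0))              # should concentrate on outcome 3
```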
Symmetrical Gaussian Error Linear Units (SGELUs)
In this paper, a novel neural network activation function, called the Symmetrical Gaussian Error Linear Unit (SGELU), is proposed to obtain high performance. This is achieved by integrating the stochastic-regularizer property of the Gaussian Error Linear Unit (GELU) with a symmetrical characteristic. By combining these two merits, the proposed unit gains the capability of bidirectional convergence, allowing the network to be optimized without the vanishing gradient problem. Evaluations of SGELU against GELU and the Linearly Scaled Hyperbolic Tangent (LiSHT) have been carried out on MNIST classification and an MNIST auto-encoder, validating its performance and convergence rate on these applications.
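For comparison, a sketch of GELU next to a symmetric (even) variant of the form $\alpha\, x\, \mathrm{erf}(x/\sqrt{2})$; this assumed form only illustrates the "symmetrical" idea and may not match the paper's exact definition of SGELU.

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def sgelu(x, alpha=1.0):
    # Assumed symmetric variant: an even function, sgelu(-x) == sgelu(x),
    # so large inputs of either sign produce a large activation.
    return alpha * x * erf(x / np.sqrt(2.0))

x = np.linspace(-4, 4, 9)
print(gelu(x))
print(sgelu(x))
```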
Regularized Ensembles and Transferability in Adversarial Learning
Chen, Yifan, Vorobeychik, Yevgeniy
Despite the considerable success of convolutional neural networks in a broad array of domains, recent research has shown them to be vulnerable to small adversarial perturbations, commonly known as adversarial examples. Moreover, such examples have been shown to be remarkably portable, or transferable, from one model to another, enabling highly successful black-box attacks. We explore this issue of transferability and robustness along two dimensions: first, the impact of conventional $l_p$ regularization as well as replacing the top layer with a linear support vector machine (SVM), and second, the value of combining regularized models into an ensemble. We show that models trained with different regularizers present barriers to transferability, as does partial information about the models comprising the ensemble.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
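A small sketch of one ingredient described in the entry above: replacing the softmax top layer with a linear SVM head trained by a hinge loss plus explicit $l_2$ regularization. The convolutional feature extractor is stubbed out with random features, and all sizes and constants are placeholders, not the paper's configuration.

```python
import torch

feat_dim, n_classes, C = 128, 10, 1e-2
features = torch.randn(256, feat_dim)            # stand-in for CNN features
labels = torch.randint(0, n_classes, (256,))

svm_head = torch.nn.Linear(feat_dim, n_classes)  # linear SVM top layer
opt = torch.optim.SGD(svm_head.parameters(), lr=0.1)

for epoch in range(100):
    opt.zero_grad()
    scores = svm_head(features)
    hinge = torch.nn.functional.multi_margin_loss(scores, labels)   # hinge loss
    l2 = C * sum((p ** 2).sum() for p in svm_head.parameters())     # explicit l2
    (hinge + l2).backward()
    opt.step()

# An ensemble prediction would simply average the regularized members' scores.
```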
Scalable kernel-based variable selection with sparsistency
He, Xin, Wang, Junhui, Lv, Shaogao
Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm should be flexible, scalable, and backed by theoretical guarantees, yet most existing algorithms cannot attain all of these properties at the same time. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of the regression function and its gradient functions as well as a hard-thresholding step. Its key advantage is that it imposes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension and can be further improved through parallel computing. The sparsistency of the proposed algorithm is established for general RKHS under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.
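A rough sketch of the three-step recipe under stated assumptions: a Gaussian-kernel ridge estimate of the regression function, its gradient functions evaluated at the sample points, and a hard threshold on each predictor's empirical gradient norm. The bandwidth, ridge parameter, and thresholding rule here are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 2 * np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)  # only x1, x2 active

# Step 1: kernel ridge estimate f_hat(x) = sum_i alpha_i K(x_i, x).
gamma, lam = 0.1, 1e-2
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

# Step 2: gradient functions of f_hat at the sample points, using
# d/dx_j exp(-gamma ||x - x_i||^2) = -2 gamma (x_j - x_ij) K(x_i, x).
diffs = X[:, None, :] - X[None, :, :]                  # (n, n, p)
grads = -2 * gamma * np.einsum("ij,ijk->ik", K * alpha[None, :], diffs)

# Step 3: hard-threshold the empirical norm of each gradient function.
scores = np.sqrt((grads ** 2).mean(0))
threshold = 0.1 * scores.max()                         # illustrative rule
print(scores.round(3))
print("selected:", np.where(scores > threshold)[0])    # typically indices 0 and 1
```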